Cisco RTMT alert defined
RTMT Alerts Defined

The following list describes each preconfigured alert in alphabetical order, its significance, and its possible triggering conditions. Four alerts that do not map to a screen in RTMT (CodeRedEntry, CodeYellowEntry, ExcessiveVoiceQualityReports, and MaliciousCallTrace) are also described.

CallProcessingNodeCpuPegging

This alert means that a node in the cluster that services phones has been experiencing high CPU load for a sustained period of time. Check whether dial tone is delayed when a phone goes off-hook; if it is, the high CPU load is affecting your service levels. Use either Virtual Network Computing (VNC) or Remote Desktop Connection to log on to the machine, and use Windows Task Manager to check each service's CPU utilization. Note which service is consuming the CPU, and open a priority 2 (P2) case with the Cisco Technical Assistance Center (TAC) indicating a degraded, but working, state. You can also check each process's CPU utilization by selecting Server > CPU&Memory in RTMT, which shows each process and its percentage of CPU utilization in a format similar to that of Windows Task Manager.

CodeRedEntry

Although not associated with any particular RTMT screen, this alert indicates that CallManager has restarted itself. When you receive this alert, check the Windows Event Log to determine why the restart occurred.

CodeYellowEntry

Although not associated with any particular RTMT screen, this alert means that CallManager has started to reject calls because of high load. When this alert triggers, examine the system's CPU load and begin troubleshooting.
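RTMT's exact rule for what counts as "sustained" high CPU in CallProcessingNodeCpuPegging is not described here. As a minimal sketch, assuming a hypothetical threshold of 90 percent that must hold for every sample in a window of consecutive polls, the idea looks like this. The function name, threshold, and window size are illustrative assumptions, not RTMT's configured defaults.

```python
from collections import deque
from typing import Iterable, List


def sustained_pegging(samples: Iterable[float],
                      threshold_pct: float = 90.0,
                      window: int = 5) -> List[int]:
    """Return the sample indices at which CPU has stayed at or above
    threshold_pct for `window` consecutive samples.

    Hypothetical helper: threshold_pct and window are assumptions, not RTMT values.
    """
    recent: deque = deque(maxlen=window)  # keeps only the last `window` samples
    alerts: List[int] = []
    for i, pct in enumerate(samples):
        recent.append(pct)
        # Fire only when the window is full and every sample in it is high,
        # so a single short spike does not raise an alert.
        if len(recent) == window and all(p >= threshold_pct for p in recent):
            alerts.append(i)
    return alerts
```

For example, `sustained_pegging([50, 95, 96, 97, 40, 92, 93, 94, 95, 96], window=3)` returns `[3, 7, 8, 9]`: the condition is met once three high samples in a row arrive, and the dip at index 4 breaks the run. A real monitor would usually also suppress repeat alerts after the first one; that is omitted here for brevity.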
CriticalServiceDown

This alert is issued when any of the following services goes down:
*Cisco Extended Functions (CallBackService.exe)
*Cisco CallManager (ccm.exe)
*Cisco CDR Insert (InsertCDR.exe)
*Cisco CTIManager (CTIManager.exe)
*Cisco Database Layer Monitor (AuPair.exe)
*Cisco IP Voice Media Streaming App (Ipvmsapp.exe)
*Cisco Messaging Interface (CiscoMessagingInterface.exe)
*Cisco MOH Audio Translator (AudioTranslator.exe)
*Cisco RIS Data Collector (RISDC.exe)
*Cisco Telephony Call Dispatcher (TcdSrv.exe)
*Cisco TFTP (ctftp.exe)
*Cisco Tomcat (jk_nt_service.exe Cisco Tomcat)

In addition, the following NT services are monitored:
*DC Directory Server (DCX500.exe)
*World Wide Web Publishing Service (inetinfo.exe)
*MSSQLServer (sqlservr.exe)
*SQLServerAgent (sqlagent.exe)
*SNMP Service (snmp.exe)

DirectoryConnectionFailed

This alert indicates that CallManager has a problem accessing the directory. If CallManager cannot access the directory, several applications might fail, including IPCC Express, the Cisco CallManager User Options web page, extension mobility, and software-based IP phones such as SoftPhone and Communicator. When you get this alert, cross-check it against the CriticalServiceDown alert to see whether the directory service has stopped.

DirectoryReplicationFailed

This alert is important because it represents a fundamental problem in communication and processing between a cluster's nodes. It often triggers during an upgrade if consistent passwords are not used across the cluster or if the cluster's servers are not upgraded in the proper order.

ExcessiveVoiceQualityReports

Although not associated with any particular RTMT screen, this alert triggers when the Quality Report Tool (QRT) softkey is pressed ten times within one hour (the default) by any combination of users, not just the same user. You can configure both the number of QRT softkey presses and the time period in which they must occur.
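The ExcessiveVoiceQualityReports trigger (a configurable number of QRT presses within a configurable time period, counted across all users) behaves like a sliding-window counter. The following is a minimal sketch under that reading; the class name `SlidingWindowTrigger` and the timestamps-in-seconds input are hypothetical, and the defaults simply mirror the ten-presses-per-hour default. How RTMT counts internally is not documented here.

```python
from collections import deque


class SlidingWindowTrigger:
    """Fire when `limit` events occur within `window_s` seconds.

    Hypothetical helper: presses from every user share one window, matching
    the 'any combination of users' rule for ExcessiveVoiceQualityReports.
    """

    def __init__(self, limit: int = 10, window_s: float = 3600.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self.events: deque = deque()  # timestamps still inside the window

    def record(self, ts: float) -> bool:
        """Record one press at time `ts` (seconds); return True if the alert fires."""
        self.events.append(ts)
        # Drop presses that are a full window or more older than this one.
        while self.events and ts - self.events[0] >= self.window_s:
            self.events.popleft()
        return len(self.events) >= self.limit
```

Because all users feed the same window, three users pressing QRT three, three, and four times within the hour also fire the alert, while ten presses spread evenly over more than an hour do not.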
When you receive an ExcessiveVoiceQualityReports alert, take the information it contains and begin troubleshooting.

LowCallManagerHeartbeatRate

This alert indicates that the CallManager process is not generating heartbeats, or not responding to heartbeat requests, in a timely manner. A significant slowdown in the heartbeat rate indicates a problem with the Cisco CallManager service, which might be going down.

LowAvailableDiskSpace

This is a critical alert to watch for because low disk space can cause directory-related functions to stop working. These include, but are not limited to, access to the Cisco CallManager User Options web page, IPCC Express, and IP interactive voice response (IVR). Low disk space can also cause serious database problems that make adding phones and making changes problematic.

CallManager is optimized to run from memory. As a result, in large deployments the system can run low on memory if it is not designed properly. It is also quite possible that your requirements push CallManager's overall design limits. Software bugs can also affect memory availability, so this alarm is quite important. If you encounter this alarm, call TAC and open a case. If TAC advises you that the problem is not related to a bug or memory leak, revisit your cluster's design and move resources to another server.

LowTcdServerHeartbeatRate

This alert indicates that the Cisco Telephony Call Dispatcher (TCD) process is not generating heartbeats, or not responding to heartbeat requests, in a timely manner. A significant slowdown in the heartbeat rate indicates a problem with the TCD service, which might be going down.

LowTFTPServerHeartbeatRate

This alert indicates that the TFTP process is not generating heartbeats, or not responding to heartbeat requests, in a timely manner. A significant slowdown in the heartbeat rate indicates a problem with the Cisco TFTP service, which might be going down.
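All three heartbeat alerts compare how often a process produces heartbeats against what is expected of it. As a minimal sketch, assuming heartbeat timestamps are available in seconds and that "slows down significantly" means the recent rate has fallen below a hypothetical fraction of an expected baseline, the check could look like this. `heartbeat_rate`, `is_heartbeat_low`, the baseline, and the 0.5 fraction are illustrative assumptions, not RTMT's actual thresholds.

```python
from typing import Sequence


def heartbeat_rate(timestamps: Sequence[float], window_s: float, now: float) -> float:
    """Heartbeats per second observed in the `window_s` seconds ending at `now`."""
    recent = [t for t in timestamps if now - window_s < t <= now]
    return len(recent) / window_s


def is_heartbeat_low(timestamps: Sequence[float], expected_per_s: float,
                     window_s: float, now: float, min_fraction: float = 0.5) -> bool:
    """True when the observed rate falls below `min_fraction` of the expected rate.

    `expected_per_s` and `min_fraction` are hypothetical tuning values.
    """
    return heartbeat_rate(timestamps, window_s, now) < min_fraction * expected_per_s
```

A process that beats once per second over a 60-second window yields a rate of 1.0 and passes; one that beats only every three seconds yields about 0.33 and is flagged against a 1.0 baseline.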
MaliciousCallTrace

Although not associated with any particular RTMT screen, this alert triggers when CallManager raises a malicious call trace alarm. When you receive this alert, collect the information it contains, contact the party who made the report, and follow your malicious call policy.

MediaListExhausted

This alert indicates that the available conference or transcoding resources have been exhausted. When this happens, check whether enough conferencing digital signal processors (DSPs) are allocated on the Cisco Catalyst 6608 cards or Communication Media Modules (CMMs) in your network. As you add users to the network, you will likely need more conferencing resources.

NOTE MediaListExhausted is a critical alert in centralized call processing environments that require conferencing and transcoding resources at the core (central site). If this alert triggers, users potentially cannot initiate conference calls, or cannot use voice mail if your voice mail service supports only G.711, such as an Octel or SMDI connection.

MgcpDChannelOutOfService

When this alert triggers, immediately check the cabling between your minimum point of entry (MPOE) and the extended demarcation point. If the cabling is good, call your service provider and open a help case. Also check whether the D-channel is flapping (going up and down): if its counter fluctuates between 1 and 0, call your service provider and open a help case.

NonCallProcessingNodeCpuPegging

This alert could indicate that processes on a system are hanging or that a single process is consuming all the CPU on the server. When this alarm triggers, check Windows Task Manager on the system, and be ready to end the process or reboot the system.
You can also check each process's CPU utilization by selecting Server > CPU&Memory in RTMT, which shows each process and its percentage of CPU utilization in a format similar to that of Windows Task Manager.

NumberOfRegisteredGatewaysDecreased

This is a key statistic to monitor because it can indicate a problem with inbound and outbound Public Switched Telephone Network (PSTN) calls. When the number of registered gateways decreases, either a device has crashed or a network connectivity problem exists between CallManager and a gateway. Depending on the size of the organization, the impact can be major.

NumberOfRegisteredGatewaysIncreased

This alert might trigger when a failed gateway comes back online or when gateways are added to the system. Inform your network operations center (NOC) whenever you intend to add devices or gateways to the CallManager environment so that the NOC team doesn't spend time troubleshooting problems that don't exist.

NumberOfRegisteredMediaDevicesDecreased

This alert generally indicates some kind of device failure. If you have media resources in a Catalyst switch or Cisco router and that device experiences a problem, this alert triggers. When you receive this alert, contact the network administration team; if all alerts go to a central NOC, correlating the errors is a simple exercise. Depending on how you've set up media resource group lists (MRGLs) in CallManager Administration, users might not be able to use conferencing or cBarge features during the outage. This alert also triggers when devices are taken out of service for maintenance.

NumberOfRegisteredMediaDevicesIncreased

This alert indicates either that failed devices have been restored to service or that resources have been added to the network. It is primarily informational.

NumberOfRegisteredPhonesDropped

A drop in the number of registered phones could indicate a network failure.
Although the network management center should already be aware of any outage, the size of the drop helps locate it: a small-to-moderate drop could indicate the failure of an edge switch, whereas a large drop could indicate a distribution layer switch outage.

RouteListExhausted

When a route list is exhausted, you might not have enough gateway resources to service your traffic patterns. If this alert triggers once or twice, it is generally not a cause for concern. However, if it starts triggering many times per day, it's time to add gateway resources. Because adding resources can't happen overnight due to telephone company and equipment delivery lead times, continuously check the T1ChannelsActive or PRIChannelsActive counters in the Cisco CallManager object to make sure they are not trending upward. Alternatively, you can configure the alert properties so that the alert triggers only if route lists are exhausted a certain number of times within a certain amount of time.
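One simple way to spot an upward trend in T1ChannelsActive or PRIChannelsActive is to fit a least-squares line to periodic samples and look at its slope. The sketch below assumes you already export those counter values as (day, active channels) pairs; the function names and the 0.5-channels-per-day threshold are hypothetical, and a straight-line fit is only a rough proxy for real capacity planning.

```python
from typing import Sequence, Tuple


def least_squares_slope(points: Sequence[Tuple[float, float]]) -> float:
    """Slope of the ordinary least-squares line through (x, y) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("samples need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def is_trending_up(points: Sequence[Tuple[float, float]],
                   min_slope: float = 0.5) -> bool:
    """True if active channels grow faster than `min_slope` per unit of x (hypothetical)."""
    return least_squares_slope(points) > min_slope
```

For example, daily samples `[(0, 10), (1, 11), (2, 12), (3, 13)]` give a slope of 1.0 additional active channel per day, which `is_trending_up` flags, while a flat series gives a slope of 0.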